Abstract

This paper considers randomized discrete-time consensus systems that preserve the average on
expectation. As a main result, we provide an upper bound on the mean square deviation of the 
consensus value from the initial average. Then, we particularize our result to systems where the 
interactions which take place simultaneously are few, or weakly correlated; these assumptions 
cover several algorithms proposed in the literature. For such systems we show that, when the 
system size grows, the deviation tends to zero, not slower than the inverse of the size. Our 
results are based on a new approach, unrelated to the convergence properties of the system: this 
independence questions the relevance in this context of the spectral properties of the matrices 
related to the graph of possible interactions, which have appeared in some previous results. 

1 Introduction 

In modern control and signal processing applications, effective and easy-to-implement distributed 
algorithms for computing averages are an important tool. As a significant and motivating example, we 
recall that, in estimation problems, the law of large numbers allows using the average as the estimator 
of an expectation. In that context, the average is an unbiased estimator, and its mean square error 
decreases as the inverse of the number of samples. In a distributed setting, the sample values are 
available at the nodes of a communication network, and the average needs to be approximated by 
running an iterative consensus system having the sample data as the initial condition. Clearly one 
has to ensure that along the iterations of the consensus system, little (or no) deviation from the 
correct average is introduced. This requirement can be achieved by a local symmetry assumption 
or by some global constraints on the update, which may be difficult to enforce when updates are 
performed asynchronously, possibly following a random scheme. Then, a weaker requirement for 
stochastic updates is preserving the average on expectation: such systems are known to converge to 
consensus under mild conditions, but their consensus value is in general different from the average. 

In this paper, we consider linear randomized asynchronous averaging algorithms, and we analyze 
the mean square deviation of the consensus value from the initial average. We want to ensure that 
this error is small, so that averages can be effectively computed. In particular, we aim at providing 
conditions under which the mean square error tends to zero when the number of samples, i.e. the 
number of nodes, grows. We will refer to this property as the asymptotical accuracy of the algorithm.

Literature review 

The use of randomized systems to compute averages has already attracted significant
interest, as testified by recent surveys and special issues [11, 4]. Convergence theories for randomized
linear averaging algorithms have been developed by Fagnani and Zampieri [9] and with more generality 
by Tahbaz-Salehi and Jadbabaie [12] and by Matei and Baras [10]. As we will formally define later, 
random linear averaging algorithms consist in multiplying the node-indexed state by a random update
matrix. In principle, the variance of the consensus value can be exactly computed by the formula in [12, 
Eq. (7)], which involves the dominant eigenvectors of the first two moments of the update matrix. 
Unfortunately, little is known about these eigenvectors, and in particular explicit formulas are not 
available, so that these results are difficult to apply. A few papers, on the other hand, have focused on 
specific examples of randomized algorithms, obtaining results which are interesting, although partial, 
from our perspective [8, 2, 6, 7]. Typically, these results are obtained as a by-product of a convergence 
analysis and involve the eigenvalues of the update matrices, which are fairly well known for many 
families of communication graphs. We come back to these results in Section 3 when discussing the 
examples. 

Contribution 

In this paper, we consider discrete-time consensus systems with random updates that preserve the 
average in expectation, and we provide new bounds on the mean square deviation of the current 
average from the initial average. We show that under certain conditions the expected increase of the 
deviation is bounded proportionally to the expected decrease of the disagreement. This approach 
leads to bounds on the total deviation which are proportional to the initial disagreement and, unlike 
previous results, are actually independent of the convergence properties: indeed they hold at every time 
regardless of convergence. Compared to those already available in the literature, our bounds typically 
result in less conservative estimates of the deviation error and, remarkably, they are independent of the 
global properties of the communication network, like connectivity or graph spectrum and eigensystem. 
Instead, only local network properties, like the degree, play a role in the examples. By contrast, we 
recall that results in the literature about convergence, and speed of convergence, depend on global 
network properties. Our estimates show that deviation tends to zero when the number of nodes grows 
under weak assumptions on the update law, in particular for 

i) systems where few updates take place simultaneously; and 

ii) systems where the updates have small statistical dependence across the network. 

Thanks to their generality and to their dependence on only local network properties, our results offer 
effective and easy-to-implement guidelines to the designer who needs to choose a network and an 
algorithm to solve an estimation problem. 

Notation and preliminaries 

In this work, we use the notion of (weighted directed) graph, which we define as a pair $G = (I, A)$, where $I$ is a finite set whose elements are called nodes and $A \in \mathbb{R}^{I \times I}$ is a matrix with nonnegative entries. Resorting to more standard graph-theoretic jargon, we may equivalently think of an implicit edge set $E = \{(i, j) \in I \times I : A_{ij} > 0\}$ and say that $i$ is connected to $j$ when $A_{ij} > 0$. For simplicity, we will sometimes assume that a graph has no loops, that is $A_{ii} = 0$ for every $i \in I$. The column-degree $d^c_i$ of $i$ is the number of off-diagonal positive elements in the $i$-th column of $A$, i.e. the cardinality of $\{j \neq i : A_{ji} > 0\}$. The graph is said to be strongly connected if for every pair of nodes $i$ and $j$, there exists a sequence $i = i_0, i_1, \dots, i_p = j$ of nodes such that $A_{i_k i_{k+1}} > 0$ for $k = 0, \dots, p-1$. Given a graph, that is, a nonnegative matrix $A$, we can define an associated Laplacian matrix $L(A) \in \mathbb{R}^{I \times I}$ as the matrix such that $[L(A)]_{ij} = -A_{ij}$ if $i \neq j$ and $[L(A)]_{ii} = \sum_{j: j \neq i} A_{ij}$. Observe that $L(A)$ is positive semidefinite and that $L(A)\mathbf{1} = 0$, provided we denote by $\mathbf{1}$ the vector of suitable size whose components are all 1. Besides, to any matrix $L$ satisfying $L\mathbf{1} = 0$ with nonpositive off-diagonal elements, one can associate a corresponding weighted graph.
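As a small illustration of these definitions, the following Python sketch builds $L(A)$ from a nonnegative matrix and computes a column-degree. The function names `laplacian` and `column_degree` and the example matrix are ours, chosen only for illustration.

```python
def laplacian(A):
    """Return L(A): [L]_ij = -A_ij for i != j, [L]_ii = sum over j != i of A_ij."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                L[i][j] = -A[i][j]
                L[i][i] += A[i][j]
    return L


def column_degree(A, i):
    """Number of off-diagonal positive entries in the i-th column of A."""
    return sum(1 for j in range(len(A)) if j != i and A[j][i] > 0)


# Hypothetical 3-node weighted directed graph without loops.
A = [[0.0, 0.5, 0.0],
     [0.2, 0.0, 0.3],
     [0.7, 0.0, 0.0]]
L = laplacian(A)
row_sums = [sum(row) for row in L]  # each row of L(A) sums to zero, i.e. L(A) 1 = 0
```

By construction every row of `L` sums to zero up to rounding, which is the property $L(A)\mathbf{1} = 0$ stated above.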

2 Problem statement and main result 

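Given a set of nodes $I$ of finite cardinality $N$, we consider a distributed state $x(t) \in \mathbb{R}^I$ evolving according to a stochastic discrete-time system of the form

$$x_i(t+1) = \sum_{j \in I} a_{ij}(t)\, x_j(t) \quad \text{for all } i \in I,\ t \in \mathbb{Z}_{\ge 0}, \qquad (1)$$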
where for every $i, j \in I$, we assume $a_{ij}(t)$ to be a sequence of independent and identically distributed random variables such that $a_{ij}(t) \ge 0$ and $\sum_{j \in I} a_{ij}(t) = 1$ for all $t \ge 0$. System (1) is run with the goal for the state of each node to provide a good estimate of the initial average $\frac{1}{N}\sum_{i \in I} x_i(0)$. Note that $x(0)$ is unknown but given, and that all our results need to be valid for any $x(0) \in \mathbb{R}^I$. System (1) can also be conveniently rewritten as

$$x_i(t+1) = x_i(t) + \sum_{j \in I} a_{ij}(t)\,(x_j(t) - x_i(t)) \quad \text{for all } i \in I,\ t \in \mathbb{Z}_{\ge 0},$$

or in matrix form as

$$x(t+1) = x(t) - L(t)x(t), \quad t \in \mathbb{Z}_{\ge 0}, \qquad (2)$$

where we remind the reader that $L_{ij}(t) = -a_{ij}(t)$ if $i \neq j$ and $L_{ii}(t) = \sum_{j: j \neq i} a_{ij}(t)$. Namely, $L(t)$ is the Laplacian matrix of a weighted graph $(I, A(t))$ where the entries of $A(t)$ are defined as $A_{ij}(t) = a_{ij}(t)$. The convergence of (2) has been addressed in the literature: the next proposition provides a handy sufficient condition.

Proposition 1 (Convergence [9]). Consider system (2). If the graph induced by $\mathbb{E}[L(t)]$ is strongly connected and there exists $i \in I$ such that almost surely $L_{ii}(t) < 1$, then there exists a scalar random variable $x_\infty$ such that $x(t)$ converges almost surely to $x_\infty \mathbf{1}$.

Rather than in convergence, we are interested in the quality of the convergence value, in terms of its distance from the initial average: this issue is investigated in the rest of this paper. For convenience, we denote the average of the $x_i(t)$ by

$$\bar{x}(t) := \frac{1}{N}\sum_{i \in I} x_i(t)$$

and the second moment by $\overline{x^2}(t) = \frac{1}{N}\sum_{i \in I} x_i^2(t)$. The next result provides a necessary and sufficient condition for the average to be preserved in expectation, that is, for the process $\bar{x}(t)$ to be a martingale with respect to the filtration induced by $x(t)$.

Proposition 2 (Average preservation). Consider system (2) and let $\{\mathcal{F}_t\}_{t \in \mathbb{Z}_{\ge 0}}$ denote the filtration of $\sigma$-algebras generated by the process $x(t)$. Then, $\mathbb{E}[\bar{x}(t+1) \mid \mathcal{F}_t] = \bar{x}(t)$ if and only if $\mathbf{1}^*\mathbb{E}[L(t)] = 0$.

Proof. Since with our assumptions $L(t)$ is independent from $\mathcal{F}_t$, the result immediately follows from $\bar{x}(t+1) = \bar{x}(t) - N^{-1}\mathbf{1}^* L(t)x(t)$. □

In view of this result, we restrict our attention to systems that preserve the average on expectation, that is we will assume $\mathbf{1}^*\mathbb{E}[L(t)] = 0$, implying that $\mathbb{E}[\bar{x}(t)] = \bar{x}(0)$ for all $t \ge 0$. Consequently, we are left with the problem of studying the variance of $\bar{x}(t)$, that is $\mathbb{E}[(\bar{x}(t) - \bar{x}(0))^2]$. We will derive all our bounds from the following general result, which establishes that, under some conditions, the increase of the deviation is bounded proportionally to the decrease of the disagreement.

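Theorem 3 (Accuracy condition). Let $x$ be an evolution of system (2), and denote

$$V(t) = \frac{1}{N}\sum_{i \in I} (x_i(t) - \bar{x}(t))^2.$$

If $\mathbf{1}^*\mathbb{E}[L(t)] = 0$ and there exists $\gamma > 0$ such that

$$\mathbb{E}[L(s)^*\mathbf{1}\mathbf{1}^*L(s)] \le \gamma\, \mathbb{E}[L(s) + L(s)^* - L(s)^*L(s)], \qquad (3)$$

then for every $t \ge 0$, there holds

$$\mathbb{E}[(\bar{x}(t) - \bar{x}(0))^2] \le \frac{\gamma}{N}\big(V(0) - \mathbb{E}[V(t)]\big) \le \frac{\gamma}{N}V(0).$$

If moreover the system converges to consensus, then

$$\mathbb{E}[(x_\infty - \bar{x}(0))^2] \le \frac{\gamma}{N}V(0).$$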
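To make the quantities in Theorem 3 concrete, the following Python sketch computes the disagreement $V$ and the resulting bound $\frac{\gamma}{N}V(0)$ for a given initial condition. The function names and the sample data are ours and purely illustrative.

```python
def disagreement(x):
    """V = (1/N) * sum_i (x_i - mean(x))^2."""
    n = len(x)
    mean = sum(x) / n
    return sum((xi - mean) ** 2 for xi in x) / n


def deviation_bound(x0, gamma):
    """Upper bound (gamma / N) * V(0) on the mean square deviation of the average."""
    return gamma / len(x0) * disagreement(x0)


x0 = [0.0, 1.0, 2.0, 3.0]               # hypothetical initial condition, N = 4
bound = deviation_bound(x0, gamma=1.0)  # here V(0) = 1.25
```

For a fixed $\gamma$ and bounded $V(0)$, the bound decreases as $1/N$, which is the asymptotical accuracy property discussed in the Introduction.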
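Proof. We compute the increase of the deviation up to time $t$:

$$\mathbb{E}\big[(\bar{x}(t) - \bar{x}(0))^2\big] = \mathbb{E}\Big[\Big(\sum_{s=0}^{t-1} \big(\bar{x}(s+1) - \bar{x}(s)\big)\Big)^2\Big];$$

we condition upon the filtration generated by $x(t)$:

$$= \sum_{s=0}^{t-1} \mathbb{E}\big[\mathbb{E}[(\bar{x}(s+1) - \bar{x}(s))^2 \mid \mathcal{F}_s]\big] + 2\sum_{s=0}^{t-1}\sum_{u=0}^{s-1} \mathbb{E}\big[\mathbb{E}[(\bar{x}(s+1) - \bar{x}(s))(\bar{x}(u+1) - \bar{x}(u)) \mid \mathcal{F}_s]\big]$$

and since $\mathbb{E}[(\bar{x}(s+1) - \bar{x}(s))(\bar{x}(u+1) - \bar{x}(u)) \mid \mathcal{F}_s] = 0$ for $u < s$ by Proposition 2, we obtain

$$= \sum_{s=0}^{t-1} \mathbb{E}\big[\mathbb{E}[(\bar{x}(s+1) - \bar{x}(s))^2 \mid \mathcal{F}_s]\big]$$

and finally since $\mathbb{E}[\bar{x}(s+1)\bar{x}(s) \mid \mathcal{F}_s] = \bar{x}(s)^2$

$$= \sum_{s=0}^{t-1} \mathbb{E}\big[\mathbb{E}[\bar{x}(s+1)^2 - \bar{x}(s)^2 \mid \mathcal{F}_s]\big].$$

Then, we wish to study the increase in the squared average at each iteration. To this goal, a straightforward manipulation gives the following useful relations, which are valid for every $s \ge 0$:

$$N\big(\overline{x^2}(s+1) - \overline{x^2}(s)\big) = -x(s)^*\big[L(s) + L(s)^* - L(s)^*L(s)\big]x(s)$$

$$N^2\big(\bar{x}(s+1)^2 - \bar{x}(s)^2\big) = \big(\mathbf{1}^*L(s)x(s)\big)^2 - 2\,\mathbf{1}^*x(s)\,\mathbf{1}^*L(s)x(s).$$

By taking the conditional expectation, we also obtain that

$$N\,\mathbb{E}\big[\overline{x^2}(s+1) - \overline{x^2}(s) \mid \mathcal{F}_s\big] = -x(s)^*\,\mathbb{E}\big[L(s) + L(s)^* - L(s)^*L(s)\big]x(s)$$

$$N^2\,\mathbb{E}\big[\bar{x}(s+1)^2 - \bar{x}(s)^2 \mid \mathcal{F}_s\big] = x(s)^*\,\mathbb{E}\big[L(s)^*\mathbf{1}\mathbf{1}^*L(s)\big]x(s).$$

These two formulas and (3) allow us to relate the change in the square of the average with the change in the average of the squares, implying that

$$\mathbb{E}\big[\bar{x}(s+1)^2 - \bar{x}(s)^2 \mid \mathcal{F}_s\big] \le \frac{\gamma}{N}\,\mathbb{E}\big[\overline{x^2}(s) - \overline{x^2}(s+1) \mid \mathcal{F}_s\big]$$

and consequently the increase in the deviation is upper bounded as follows:

$$\mathbb{E}\big[(\bar{x}(t) - \bar{x}(0))^2\big] \le \frac{\gamma}{N}\sum_{s=0}^{t-1} \mathbb{E}\big[\overline{x^2}(s) - \overline{x^2}(s+1)\big] = \frac{\gamma}{N}\big(\overline{x^2}(0) - \mathbb{E}[\overline{x^2}(t)]\big) \le \frac{\gamma}{N}\,\overline{x^2}(0).$$

Observe now that the algorithm is invariant under translation (addition of a constant to each component of the state). By applying the last inequality to $\tilde{x}(t) = x(t) - \bar{x}(0)\mathbf{1}$, we obtain

$$\mathbb{E}\big[(\bar{x}(t) - \bar{x}(0))^2\big] = \mathbb{E}\big[(\bar{\tilde{x}}(t) - \bar{\tilde{x}}(0))^2\big] \le \frac{\gamma}{N}\,\overline{\tilde{x}^2}(0) = \frac{\gamma}{N}\Big(\frac{1}{N}\sum_{i \in I}(x_i(0) - \bar{x}(0))^2\Big) = \frac{\gamma}{N}V(0).$$

□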
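The two one-step relations used in the proof are purely algebraic and can be checked numerically for any Laplacian. The sketch below is ours: the helper `step_identities` and the randomly generated Laplacian are illustrative assumptions, and the update is $x^+ = x - Lx$ as in system (2).

```python
import random


def step_identities(L, x):
    """Return (lhs1, rhs1, lhs2, rhs2) for the two one-step relations, with x_next = x - L x."""
    n = len(x)
    Lx = [sum(L[i][j] * x[j] for j in range(n)) for i in range(n)]
    x_next = [x[i] - Lx[i] for i in range(n)]
    # N * (average of squares at s+1 minus average of squares at s)
    lhs1 = sum(v * v for v in x_next) - sum(v * v for v in x)
    # -x*(L + L* - L*L)x = -2 x*Lx + |Lx|^2
    rhs1 = -2 * sum(x[i] * Lx[i] for i in range(n)) + sum(v * v for v in Lx)
    # N^2 * (squared average at s+1 minus squared average at s)
    lhs2 = sum(x_next) ** 2 - sum(x) ** 2
    # (1*Lx)^2 - 2 (1*x)(1*Lx)
    rhs2 = sum(Lx) ** 2 - 2 * sum(x) * sum(Lx)
    return lhs1, rhs1, lhs2, rhs2


rng = random.Random(0)
n = 5
# Hypothetical random weights with zero diagonal; L is the associated Laplacian.
A = [[rng.random() / n if i != j else 0.0 for j in range(n)] for i in range(n)]
L = [[sum(A[i]) if i == j else -A[i][j] for j in range(n)] for i in range(n)]
x = [rng.uniform(-1, 1) for _ in range(n)]
lhs1, rhs1, lhs2, rhs2 = step_identities(L, x)
```

Both pairs agree up to floating-point rounding, independently of how the weights are drawn.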
3 Applications and examples

In this section we present classes of systems of type (2) for which we can apply Theorem 3, that is, we can find $\gamma$ satisfying (3). Before presenting these example systems, we prove in the next subsection a general lemma which simplifies the search for $\gamma$.

3.1 Bounds on deterministic and stochastic Laplacians 

The idea of the proofs of our results will always be to bound $\mathbb{E}(L^*\mathbf{1}\mathbf{1}^*L)$ and $\mathbb{E}(L^*L)$ in terms of $\mathbb{E}(L + L^*)$. To this end, the following result will be useful.

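Lemma 4 (Laplacian bounds). Let $L$ be the Laplacian of a weighted directed graph, and let $a^r_{\max} > 0$ be such that

$$\sum_{j: j \neq i} a_{ij} \le a^r_{\max} \quad \text{for all } i.$$

(i) If $\mathbf{1}^*L = 0$, then

$$L^*L \le a^r_{\max}(L + L^*). \qquad (4)$$

Let now $L$ be a random variable such that the upper bound $a^r_{\max}$ is valid almost surely.

(ii) If $\mathbf{1}^*\mathbb{E}(L) = 0$, then

$$\mathbb{E}(L^*L) \le a^r_{\max}\,\mathbb{E}(L + L^*). \qquad (5)$$

(iii) If $\mathbf{1}^*\mathbb{E}(L) = 0$ and moreover there exists $\beta > 0$ such that

$$\mathbb{E}(L^*\mathbf{1}\mathbf{1}^*L) \le \beta\,\mathbb{E}(L + L^*),$$

then $\mathbb{E}[L^*\mathbf{1}\mathbf{1}^*L] \le \gamma\,\mathbb{E}[L + L^* - L^*L]$ holds for any $\gamma \ge \frac{\beta}{1 - a^r_{\max}}$.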

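Before the proof, we need the following preliminary lemma.

Lemma 5. Suppose that the coefficients $c_1, \dots, c_m$ are nonnegative. Then, there holds

$$\Big(\sum_{i=1}^m c_i z_i\Big)^2 \le \Big(\sum_{i=1}^m c_i\Big)\Big(\sum_{i=1}^m c_i z_i^2\Big).$$

Proof. Let $u, v \in \mathbb{R}^m$ be defined by $u_i = \sqrt{c_i}$ and $v_i = \sqrt{c_i}\,z_i$. It follows from the Cauchy-Schwarz inequality that

$$\Big(\sum_{i=1}^m c_i z_i\Big)^2 = (u^*v)^2 \le \|u\|_2^2\,\|v\|_2^2 = \Big(\sum_{i=1}^m c_i\Big)\Big(\sum_{i=1}^m c_i z_i^2\Big).$$

□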
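Lemma 5 is a weighted form of the Cauchy-Schwarz inequality, and it is easy to probe numerically. The following sketch is ours (the helper name `lemma5_sides` and the sampling ranges are arbitrary choices); it evaluates both sides of the inequality on random nonnegative coefficients.

```python
import random


def lemma5_sides(c, z):
    """Return (left, right) sides of (sum c_i z_i)^2 <= (sum c_i)(sum c_i z_i^2)."""
    lhs = sum(ci * zi for ci, zi in zip(c, z)) ** 2
    rhs = sum(c) * sum(ci * zi * zi for ci, zi in zip(c, z))
    return lhs, rhs


rng = random.Random(0)
checks = []
for _ in range(1000):
    m = rng.randint(1, 6)
    c = [rng.random() for _ in range(m)]          # nonnegative coefficients
    z = [rng.uniform(-5, 5) for _ in range(m)]
    lhs, rhs = lemma5_sides(c, z)
    checks.append(lhs <= rhs + 1e-9)              # small tolerance for rounding
```

Equality is attained when all $z_i$ coincide, for instance with `c = [1.0, 2.0]` and `z = [3.0, 3.0]`.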

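Proof of Lemma 4. Along this proof let $x \in \mathbb{R}^I$ be arbitrary but fixed. To prove claim (i), we note that $(Lx)_i = \sum_j a_{ij}(x_i - x_j)$. Therefore,

$$x^*L^*Lx = \sum_i \Big(\sum_j a_{ij}(x_i - x_j)\Big)^2.$$

For every $i$, since $a^r_{\max} \ge \sum_{j: j \neq i} a_{ij}$, it follows then from Lemma 5 that

$$\Big(\sum_j a_{ij}(x_j - x_i)\Big)^2 \le a^r_{\max}\sum_j a_{ij}(x_i - x_j)^2,$$

and by summing on $i$ that

$$x^*L^*Lx \le a^r_{\max}\sum_i \sum_{j: j \neq i} a_{ij}(x_i - x_j)^2. \qquad (6)$$

Also note that

$$\sum_i \sum_{j: j \neq i} a_{ij}(x_j - x_i)^2 = \sum_i \sum_{j: j \neq i} a_{ij}x_i^2 - 2\sum_i \sum_{j: j \neq i} a_{ij}x_i x_j + \sum_i \sum_{j: j \neq i} a_{ij}x_j^2.$$

Since $\mathbf{1}^*L = 0$, we have $\sum_{i: i \neq j} a_{ij} = \sum_{i: i \neq j} a_{ji}$ for every $j$. Therefore, a relabeling of the third term leads to

$$\sum_i \sum_{j: j \neq i} a_{ij}(x_j - x_i)^2 = 2\sum_i \sum_{j: j \neq i} a_{ij}x_i^2 - 2\sum_i \sum_{j: j \neq i} a_{ij}x_i x_j = 2x^*Lx = x^*(L + L^*)x,$$

where we have used $x^*L^*x = (x^*Lx)^* = x^*Lx$. Using (6), statement (i) follows.

We now turn to prove statement (ii). It follows from (6) that

$$x^*\mathbb{E}(L^*L)x = \mathbb{E}(x^*L^*Lx) \le a^r_{\max}\sum_i \sum_{j: j \neq i} \mathbb{E}(a_{ij})(x_i - x_j)^2.$$

Since $\mathbb{E}(L)$ is a (deterministic) Laplacian and $\mathbf{1}^*\mathbb{E}(L) = 0$, we can apply the same argument leading to (4) in order to argue that

$$x^*\mathbb{E}(L^*L)x \le a^r_{\max}\,x^*\mathbb{E}(L + L^*)x,$$

which implies (5). Finally, we prove the last claim (iii). It follows from (5) that $-a^r_{\max}\mathbb{E}(L + L^*) \le -\mathbb{E}(L^*L)$. Therefore, the existence of $\beta$ implies that for any $\gamma \ge \frac{\beta}{1 - a^r_{\max}}$, there holds

$$\mathbb{E}(L^*\mathbf{1}\mathbf{1}^*L) \le \beta\,\mathbb{E}(L + L^*) \le \gamma\,\mathbb{E}(L + L^*) - \gamma\,a^r_{\max}\,\mathbb{E}(L + L^*) \le \gamma\,\mathbb{E}(L + L^* - L^*L).$$

□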

The quantity $1 - a^r_{\max}$ appearing in Lemma 4 (iii) actually corresponds to a lower bound on the "self-confidence" $a_{ii}(t) = 1 - \sum_{j: j \neq i} a_{ij}(t)$ of the nodes. For a constant $\beta$, the bound on the mean square error is thus inversely proportional to the minimal self-confidence. This is consistent with the intuition that, when $a_{ii}(t)$ is very small, the information held by some nodes may be almost entirely "forgotten" in one iteration, possibly resulting in large variations of the average.

3.2 Limited simultaneous updates 

In this section, we show that a suitable $\gamma$ to satisfy the condition in Theorem 3 can be found when the number, or at least the contribution, of the simultaneous updates is small. The next result has the following interpretation: the mean square deviation can be bounded proportionally to the ratio between "strength" of the interactions in the system and "self-confidence" of each node. Note that from now on, when studying the evolution of system (2), we will for brevity omit the dependence on time of the random variables $a_{ij}$ and $L$, whenever this causes no confusion.

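Theorem 6 (Limited updates). Consider system (2) and let $a^r_{\max}$ and $a^{\mathrm{all}}_{\max}$ be two positive constants such that almost surely $\sum_{j: j \neq i} a_{ij} \le a^r_{\max}$ for all $i \in I$ and $\sum_{i,j: j \neq i} a_{ij} \le a^{\mathrm{all}}_{\max}$. If $\mathbf{1}^*\mathbb{E}(L) = 0$, then the condition of Theorem 3 holds for all

$$\gamma \ge \frac{a^{\mathrm{all}}_{\max}}{1 - a^r_{\max}}.$$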



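Proof. It follows from Lemma 5 that

$$x^*L^*\mathbf{1}\mathbf{1}^*Lx = \Big(\sum_i \sum_{j: j \neq i} a_{ij}(x_j - x_i)\Big)^2 \le a^{\mathrm{all}}_{\max}\sum_i \sum_{j: j \neq i} a_{ij}(x_j - x_i)^2.$$

Therefore,

$$\mathbb{E}(x^*L^*\mathbf{1}\mathbf{1}^*Lx) \le a^{\mathrm{all}}_{\max}\sum_i \sum_{j: j \neq i} \mathbb{E}(a_{ij})(x_j - x_i)^2 = a^{\mathrm{all}}_{\max}\,x^*\mathbb{E}(L + L^*)x,$$

where we have used Lemma 4(i), so that $\mathbb{E}(L^*\mathbf{1}\mathbf{1}^*L) \le a^{\mathrm{all}}_{\max}\,\mathbb{E}(L + L^*)$. The result follows then from Lemma 4(iii). □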

Theorem 6 can be applied to several particular cases involving a small number of edges or small interactions: we discuss here two of them, drawn from the literature. The following is the simplest example of a randomized averaging algorithm.

Example 1 (Asynchronous Asymmetric Gossip Algorithm (AAGA)). Let a graph $G = (I, W)$ and $q \in (0, 1)$ be given, such that $\mathbf{1}^*W\mathbf{1} = 1$. For every $t \ge 0$, one edge is sampled from a distribution such that the probability of selecting $(i, j)$ is $W_{ij}$. Then, $x_i(t+1) = (1 - q)\,x_i(t) + q\,x_j(t)$ and $x_k(t+1) = x_k(t)$ for $k \neq i$.

We note that Proposition 1 applies to Example 1, as well as to all following examples in this paper. 

Corollary 7 (AAGA is asymptotically accurate). Consider the AAGA system (2) of Example 1 and assume that $W\mathbf{1} = W^*\mathbf{1}$. Then Theorem 3 holds for any $\gamma \ge \frac{q}{1-q}$.

Proof. Note that $\mathbb{E}[L(t)] = qL(W)$; then $\mathbf{1}^*\mathbb{E}[L(t)] = 0$ by the assumption on $W$. To apply Theorem 6 we observe that $a^{\mathrm{all}}_{\max} = a^r_{\max} = q$ since only one node is sending her state to another. □
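For the AAGA, condition (3) can be checked by direct enumeration of the sampled edge. The sketch below is ours and makes specific assumptions: a symmetric ring in which each directed edge is selected with probability $1/(2N)$, so that $\mathbf{1}^*W\mathbf{1} = 1$ and $W\mathbf{1} = W^*\mathbf{1}$, and hypothetical helper names `ring_weights` and `aaga_forms`.

```python
import random


def ring_weights(n):
    """Edge-selection probabilities on a ring: each directed edge (i, i +/- 1) has weight 1/(2n)."""
    W = [[0.0] * n for _ in range(n)]
    for i in range(n):
        W[i][(i + 1) % n] = 1.0 / (2 * n)
        W[i][(i - 1) % n] = 1.0 / (2 * n)
    return W


def aaga_forms(W, q, x):
    """Exact E[x*L*11*Lx] and E[x*(L + L* - L*L)x] for the AAGA.

    If edge (i, j) is drawn, L = q e_i (e_i - e_j)^T, hence
    1*Lx = q (x_i - x_j), x*(L + L*)x = 2 q x_i (x_i - x_j), x*L*Lx = q^2 (x_i - x_j)^2.
    """
    lhs = rhs = 0.0
    for i, row in enumerate(W):
        for j, p in enumerate(row):
            d = x[i] - x[j]
            lhs += p * (q * d) ** 2
            rhs += p * (2 * q * x[i] * d - (q * d) ** 2)
    return lhs, rhs


q, n = 0.5, 12
gamma = q / (1 - q)
rng = random.Random(1)
W = ring_weights(n)
gaps = []
for _ in range(200):
    x = [rng.uniform(-1, 1) for _ in range(n)]
    lhs, rhs = aaga_forms(W, q, x)
    gaps.append(gamma * rhs - lhs)  # nonnegative (up to rounding) when (3) holds
```

In this symmetric setting the two sides of (3) coincide up to rounding when $\gamma = \frac{q}{1-q}$, so for this example no smaller $\gamma$ satisfies the condition.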

This result implies that the expected deviation is not larger than $\frac{q}{1-q}\frac{1}{N}V(0)$: extensive numerical simulations indicate that this bound captures the correct dependence on the size¹. We summarize this evidence in Figure 1 by taking two exemplary sequences of graphs: ring graphs and de Bruijn graphs. We recall their definitions and significant properties in the next example.

Example 2 (Ring and de Bruijn graphs). A graph $G = (\{0, \dots, N-1\}, A)$ is a ring when $A$ is such that for every $i$, it holds $A_{ij} > 0$ if and only if either $j = i + 1$ or $j = i - 1$ (modulo $N$). Instead, $G$ is a de Bruijn graph on $n$ symbols of dimension $k$ when it has $N = n^k$ nodes and every node $i$ is connected to $ni, ni + 1, ni + 2, \dots, ni + n - 1$ (all modulo $n^k$). Their global connectivity properties are very different, especially in large networks: rings are "poorly" connected, whereas de Bruijn graphs are "very well" connected. This difference can be highlighted by looking at the spectra of their Laplacians. Assuming for simplicity that $A \in \{0, 1\}^{I \times I}$, we compute the smallest non-zero eigenvalue of the Laplacian, denoted by $\lambda_1$. For rings, $\lambda_1 = 2 - 2\cos\left(\frac{2\pi}{N}\right) \le \frac{4\pi^2}{N^2}$, which goes to zero as $N$ grows. For de Bruijn graphs, $\lambda_1$ is equal to $n$. Then, we can consider a sequence of de Bruijn graphs with $n = 2$ and increasing size $2^k$, for $k \in \mathbb{N}$. On such a sequence, $\lambda_1$ does not go to zero as $N$ grows, and the degree is not larger than 2, which is the degree of the ring (and is actually exactly 2 for all nodes if self-loops are taken into account). The notable spectral properties of these two sequences are also discussed in [3], in connection with the convergence speed of averaging algorithms.
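The two graph families can be generated in a few lines. The following sketch is ours: the function names are illustrative, graphs are stored as neighbor sets rather than matrices, and `ring_lambda1` simply evaluates the closed-form expression $2 - 2\cos(2\pi/N)$ quoted above.

```python
import math


def ring_neighbors(N):
    """Ring on nodes 0..N-1: node i is connected to i+1 and i-1 (modulo N)."""
    return {i: {(i + 1) % N, (i - 1) % N} for i in range(N)}


def de_bruijn_neighbors(n, k):
    """De Bruijn graph on n symbols of dimension k: i is connected to n*i, ..., n*i+n-1 (mod n**k)."""
    N = n ** k
    return {i: {(n * i + s) % N for s in range(n)} for i in range(N)}


def ring_lambda1(N):
    """Smallest non-zero Laplacian eigenvalue of the unweighted ring."""
    return 2 - 2 * math.cos(2 * math.pi / N)


ring = ring_neighbors(16)
db = de_bruijn_neighbors(2, 4)  # 16 nodes, two outgoing connections each (self-loops at 0 and 15)
```

Both graphs have the same size and degree 2, yet `ring_lambda1(N)` decays like $1/N^2$, which is the contrast in global connectivity that the example emphasizes.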

The next example, which applies very naturally to wireless networks, has attracted significant attention [9, 2].

Example 3 (Broadcast Gossip Algorithm (BGA)). Let $q \in (0, 1)$ and a graph $G = (I, W)$ be given, such that $W \in \{0, 1\}^{I \times I}$. For every $t \ge 0$, one node $j$ is sampled from a uniform distribution over $I$. Then, $x_i(t+1) = (1 - q)\,x_i(t) + q\,x_j(t)$ if $W_{ij} > 0$ and $x_i(t+1) = x_i(t)$ otherwise. In other words, one randomly selected node broadcasts her value to all her neighbors, which update their values accordingly.
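One iteration of the BGA can be sketched as follows. The code is ours and only illustrative: `bga_step` and `ring_adjacency` are hypothetical names, the state is a plain list, and the ring is used as a sample graph.

```python
import random


def bga_step(x, W, q, rng):
    """One BGA iteration: a uniformly chosen node j broadcasts; every i with W[i][j] > 0 updates."""
    j = rng.randrange(len(x))
    new_x = [(1 - q) * xi + q * x[j] if W[i][j] > 0 else xi
             for i, xi in enumerate(x)]
    return new_x, j


def ring_adjacency(n):
    """0/1 adjacency matrix of the ring on n nodes, without self-loops."""
    W = [[0] * n for _ in range(n)]
    for i in range(n):
        W[i][(i + 1) % n] = 1
        W[i][(i - 1) % n] = 1
    return W


rng = random.Random(0)
n, q = 8, 0.5
x0 = [float(i) for i in range(n)]
x1, j = bga_step(x0, ring_adjacency(n), q, rng)
changed = [i for i in range(n) if x1[i] != x0[i]]  # the two ring neighbors of j
```

Only the neighbors of the broadcasting node move, while the broadcaster keeps her value: this asymmetry is why the average is preserved only on expectation.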

A few partial results about the mean square deviation of this algorithm are available in the literature: we provide a summary in the following remark.

¹ The AAGA system is also studied in [8, Section 4], where it is proved that the mean square deviation of the limit

value is not larger than -r^ — r^r 4? VY0). We note that although the authors assume that the initial condition is a random 
l-q+-fr N 

variable, their results on the deviation actually remain valid for arbitrary x(0). Our bound from Corollary 7 is slightly 
larger, but asymptotically equivalent. 
Figure 1 (panels: AAGA, BGA): Logarithmic plots of the simulated mean square deviation $\mathbb{E}[(x_\infty - \bar{x}(0))^2]$ against the graph size $N$, for AAGA and BGA systems with $q = 0.5$ running over ring graphs and de Bruijn graphs on 2 symbols. See text for precise graph definitions. Convergence to consensus is approximated by the condition $N^{-1}\|x(t) - \bar{x}(t)\mathbf{1}\|_2^2 \le 10^{-4}$ and expectation is simulated by averaging over 500 runs. Each initial condition is sampled from independent uniform distributions over $[0, 1]$.



Remark 1 (Earlier results on BGA). Previous results about the deviation of BGA are dependent on 
the topology of the network. In [2, Proposition 3] it is proved that the mean square deviation is upper 

bounded by V(0) ( 1 — - — — J , where Aj is the i-th smallest non-zero eigenvalue of the 



AiV-l 1 - 2f^jV-l . 

Laplacian of the graph $G$. This bound, however, does not imply that the deviation goes to zero as $N$ grows. In [6, Proposition 3.3] the authors obtain the upper bound $2V(0)\frac{q}{1-q}\frac{d_{\max}^2}{N\lambda_1}$, where $d_{\max}$ is the maximum degree of the graph. We can see that on a ring graph $\frac{d_{\max}^2}{N\lambda_1} \ge \frac{N}{\pi^2}$: in such a case, one cannot argue that the deviation goes to zero. Although for rings and other sequences of graphs asymptotical accuracy is shown in [7], using Markov chain theory results from [5], a general proof of accuracy is not available in the literature. □

Simulations show that the global topology of the graph plays a limited role: as a demonstration, 
we plot in Figure 1 the results for ring and de Bruijn graphs. Based on similar simulations, it was 
conjectured in [6] that the mean square error of the BGA is proportional to the ratio between the 
degree and the number of nodes. This fact can actually be proved by applying Theorem 6. 

Corollary 8 (BGA is asymptotically accurate). Consider the BGA system (2) of Example 3 and assume that $W\mathbf{1} = W^*\mathbf{1}$. Then Theorem 3 holds for any $\gamma \ge \frac{q\,d^c_{\max}}{1-q}$, where $d^c_{\max}$ is the maximum column degree of the graph.

Proof. Note that $\mathbb{E}[L(t)] = \frac{q}{N}L(W)$; then $\mathbf{1}^*\mathbb{E}[L(t)] = 0$ by the assumption on $W$. To apply Theorem 6 we observe that $a^r_{\max} = q$ and $a^{\mathrm{all}}_{\max} = q\,d^c_{\max}$, since one node may send her value to at most $d^c_{\max}$ neighbors. □

3.3 Uncorrelated updates 

In this section we show that a small $\gamma$ can still be found even if there are many simultaneous updates, provided that the correlation between the updates is sufficiently small. Indeed, the argument presented in the next remark shows that for an algorithm such that all entries $a_{ij}$ are uncorrelated, the conclusion of Theorem 3 holds.

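Remark 2 (No correlation implies accuracy). Observe that, using the convention that $a_{ii} = 0$, we have

$$x^*L^*\mathbf{1}\mathbf{1}^*Lx = \Big(\sum_{i,j \in I} a_{ij}(x_j - x_i)\Big)^2 = \sum_{i,j,k,l \in I} a_{ij}a_{kl}(x_j - x_i)(x_l - x_k). \qquad (7)$$

Suppose now that all $a_{ij}$ are uncorrelated, then

$$\mathbb{E}(x^*L^*\mathbf{1}\mathbf{1}^*Lx) = \sum_{i,j,k,l \in I} \mathbb{E}(a_{ij})\mathbb{E}(a_{kl})(x_j - x_i)(x_l - x_k) + \sum_{i,j \in I} \big(\mathbb{E}(a_{ij}^2) - \mathbb{E}(a_{ij})^2\big)(x_j - x_i)^2.$$

It follows from (7) and from $\mathbf{1}^*\mathbb{E}(L) = 0$ that the first term is 0. Then, if $a_{ij} \le a^e_{\max} < 1$ for every $i, j$, we obtain

$$\mathbb{E}(x^*L^*\mathbf{1}\mathbf{1}^*Lx) \le a^e_{\max}\sum_{i,j \in I} \mathbb{E}(a_{ij})(x_j - x_i)^2 = a^e_{\max}\,x^*\mathbb{E}(L + L^*)x,$$

where the last equality holds thanks to $\mathbf{1}^*\mathbb{E}(L) = 0$. Lemma 4(iii) implies then that Theorem 3 holds with $\gamma = \frac{\beta}{1 - a^r_{\max}} = \frac{a^e_{\max}}{1 - a^r_{\max}}$. □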

Clearly, a strong assumption such as the absence of correlation between the $a_{ij}$ is rarely met in practical algorithms. However, in this spirit we can look for results assuming a "small" degree of correlation. The first result assumes that the rows of $L(t)$ are uncorrelated, corresponding to the absence of correlation between the update behavior of the different nodes. It implies in particular that any scheme (preserving the average $\bar{x}$ on expectation) where nodes update their values independently and have a minimal self-confidence is asymptotically accurate.

Theorem 9 (Uncorrelated updates). Consider system (2) and let $a^r_{\max}$ be a positive constant such that almost surely $\sum_{j: j \neq i} a_{ij} \le a^r_{\max}$ for all $i \in I$. Assume that $a_{ij}$ and $a_{kl}$ are uncorrelated when $i \neq k$. If $\mathbf{1}^*\mathbb{E}(L) = 0$, then the condition of Theorem 3 holds for any

$$\gamma \ge \frac{a^r_{\max}}{1 - a^r_{\max}}.$$

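Proof. To take advantage of the decorrelation assumption, we first analyze the expression $\mathbb{E}[L^*(\mathbf{1}\mathbf{1}^* - I_N)L]$. Let $l_{i,*}$ be the $i$-th row of $L$, i.e. $L = (l_{1,*}^*, l_{2,*}^*, \dots, l_{N,*}^*)^*$. Our assumption implies that $l_{i,*}$ and $l_{j,*}$ are uncorrelated when $i \neq j$. Therefore, there holds

$$\mathbb{E}\big(L^*(\mathbf{1}\mathbf{1}^* - I_N)L\big) = \mathbb{E}\Big(\sum_{i \neq j} l_{i,*}^*\,l_{j,*}\Big) = \sum_{i \neq j} \mathbb{E}[l_{i,*}]^*\,\mathbb{E}[l_{j,*}] = \mathbb{E}[L^*](\mathbf{1}\mathbf{1}^* - I_N)\mathbb{E}[L].$$

Since $\mathbf{1}^*\mathbb{E}(L) = 0$, this implies that

$$\mathbb{E}\big(L^*(\mathbf{1}\mathbf{1}^* - I)L\big) = -(\mathbb{E}L)^*(\mathbb{E}L) \le 0.$$

Therefore, using first this last inequality and then Lemma 4(ii), we obtain

$$\mathbb{E}(L^*\mathbf{1}\mathbf{1}^*L) = \mathbb{E}\big(L^*(\mathbf{1}\mathbf{1}^* - I)L\big) + \mathbb{E}(L^*L) \le \mathbb{E}(L^*L) \le a^r_{\max}\,\mathbb{E}(L + L^*).$$

The result follows then from Lemma 4(iii). □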

A natural example of uncorrelated updates is as follows. 

Example 4 (Synchronous Asymmetric Gossip Algorithm (SAGA)). Let $q \in (0, 1)$ and a graph $G = (I, W)$ be given, such that $W\mathbf{1} = \mathbf{1}$. For every $t \ge 0$, and every $i \in I$ one edge $(i, j_i)$ is sampled from a distribution such that the probability of selecting $(i, j_i)$ is $W_{i j_i}$. Then, for every $i \in I$,

$$x_i(t+1) = (1 - q)\,x_i(t) + q\,x_{j_i}(t).$$

In other words, every node chooses one neighbor, reads her value, and updates her own value accordingly.

Previous results on SAGA are only able to guarantee asymptotical accuracy on certain sequences 
of graphs. 
Remark 3 (Earlier results on SAGA). This algorithm is also studied in [8, Section 5], where the
authors derive an upper bound on the deviation of the limit value. When W is symmetric, this 
bound is asymptotically equivalent, as N -> oo, to jj^^lv r^TWT ^W' where esr(PF) is the second- 
largest absolute value of the eigenvalues of W. This result fails to prove asymptotical accuracy for 
some sequences of graphs. For instance, on a ring graph with positive $W_{ij}$'s equal to 1/2, we have

V(0) > T^no). □ 



2N l-csr(W) _ 2N cos(^) V W ~ 1-q 4tt 

Simulations in Figure 2 suggest instead that asymptotical accuracy is a general property of SAGA: 
this fact is proved in the next result. 

Corollary 10 (SAGA is asymptotically accurate). Consider the SAGA system (2) of Example 4 and assume $\mathbf{1}^*W = \mathbf{1}^*$. Then Theorem 3 holds for any $\gamma \ge \frac{q}{1-q}$.

Proof. Note that $\mathbb{E}[L(t)] = qL(W)$; then $\mathbf{1}^*\mathbb{E}[L(t)] = 0$ by the assumption on $W$. To apply Theorem 9 we observe that $a^r_{\max} = q$ since every node receives exactly one value. □

A second theorem, which is in a sense dual to Theorem 9, assumes that the columns of $L(t)$ are uncorrelated, corresponding to the fact that the transmission of information from one node to her neighbors is not correlated with the transmissions from other nodes.

Theorem 11 (Uncorrelated transmissions). Consider system (2) and let $a^c_{\max}$ be a positive constant such that almost surely $a^c_{\max} \ge \sum_{i: i \neq j} a_{ij}$ for all $j \in I$. Assume that $a_{ij}$ and $a_{kl}$ are uncorrelated if $l \neq j$. If $\mathbf{1}^*\mathbb{E}(L) = 0$, then the condition of Theorem 3 holds for any

$$\gamma \ge \frac{a^c_{\max}}{1 - a^r_{\max}}.$$

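Proof. Let $A$ be such that $L = \mathrm{diag}(A\mathbf{1}) - A$. Let then $K = \mathrm{diag}(A^*\mathbf{1}) - A^*$. One can verify that $K$ is the Laplacian of the weighted graph obtained by reversing all edges in the graph encoded by $L$. In particular, its off-diagonal elements are non-positive, and $K\mathbf{1} = \mathrm{diag}(A^*\mathbf{1})\mathbf{1} - A^*\mathbf{1} = A^*\mathbf{1} - A^*\mathbf{1} = 0$. Observe that

$$\mathbf{1}^*K = \mathbf{1}^*\mathrm{diag}(A^*\mathbf{1}) - \mathbf{1}^*A^* = \mathbf{1}^*A - \mathbf{1}^*A^* = \mathbf{1}^*A - \mathbf{1}^*\mathrm{diag}(A\mathbf{1}) = -\mathbf{1}^*L, \qquad (8)$$

so that $\mathbf{1}^*\mathbb{E}(K) = -\mathbf{1}^*\mathbb{E}(L) = 0$. Moreover, the rows of $K$ are uncorrelated by construction, and $a^c_{\max} \ge \sum_{j: j \neq i} k_{ij} = \sum_{j: j \neq i} a_{ji}$ for all $i$ and all realizations. By the argument of Theorem 9, we have thus

$$\mathbb{E}(K^*\mathbf{1}\mathbf{1}^*K) \le a^c_{\max}\,\mathbb{E}(K + K^*). \qquad (9)$$

Now, (8) implies that $K^*\mathbf{1}\mathbf{1}^*K = L^*\mathbf{1}\mathbf{1}^*L$. And, since $\mathbf{1}^*\mathbb{E}L = 0$, there holds

$$\mathbf{1}^*\mathbb{E}(A) = \mathbf{1}^*\mathrm{diag}(\mathbb{E}(A)\mathbf{1}) = \mathbf{1}^*\mathbb{E}(A^*),$$

so that $\mathbb{E}(A)\mathbf{1} = \mathbb{E}(A^*)\mathbf{1}$. As a result,

$$\mathbb{E}(K + K^*) = \mathbb{E}(\mathrm{diag}(A^*\mathbf{1})) - \mathbb{E}(A^*) + \mathbb{E}(\mathrm{diag}(A^*\mathbf{1})) - \mathbb{E}(A)$$
$$= \mathbb{E}(\mathrm{diag}(A\mathbf{1})) - \mathbb{E}(A^*) + \mathbb{E}(\mathrm{diag}(A\mathbf{1})) - \mathbb{E}(A)$$
$$= \mathbb{E}(L + L^*).$$

The inequality (9) implies thus $\mathbb{E}(L^*\mathbf{1}\mathbf{1}^*L) \le a^c_{\max}\,\mathbb{E}(L + L^*)$, and the result follows from Lemma 4(iii).

□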

An example for the application of Theorem 11 is provided as follows. 

Example 5 (Reverse Synchronized Asymmetric Gossip Algorithm (RSAGA)). Let a graph $G = (I, W)$ be given, such that $\mathbf{1}^*W = \mathbf{1}^*$. At each time step, every node $j$ sends her value to one neighbor $i_j$, chosen with probability $W_{i_j j}$. Every node $i$ then updates her value to $x_i(t+1) = x_i(t) + q\sum_{j: i_j = i}(x_j(t) - x_i(t))$, where $q \in (0, 1/d^c_{\max})$ and $d^c_{\max}$ is the maximum column degree of the graph. In other words, every node sends her value to one of her neighbors, and then updates her value using all the values that she has received.
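A single RSAGA iteration can be sketched as follows. The code is ours and rests on the update rule $x_i(t+1) = x_i(t) + q\sum_{j: i_j = i}(x_j(t) - x_i(t))$; the function name `rsaga_step` and the ring weights are illustrative assumptions.

```python
import random


def rsaga_step(x, W, q, rng):
    """One RSAGA iteration.

    Each node j sends x[j] to one receiver i_j drawn with probability W[i][j];
    each receiver i then moves by q times the sum of (x_j - x_i) over received values.
    """
    n = len(x)
    new_x = list(x)
    for j in range(n):
        column = [W[i][j] for i in range(n)]  # sums to 1 because 1*W = 1*
        i_j = rng.choices(range(n), weights=column)[0]
        new_x[i_j] += q * (x[j] - x[i_j])     # uses the values at time t only
    return new_x


rng = random.Random(0)
n, q = 6, 0.25
# Hypothetical ring: each node sends to its left or right neighbor with probability 1/2.
W = [[0.5 if (i - j) % n in (1, n - 1) else 0.0 for j in range(n)] for i in range(n)]
x0 = [float(i) for i in range(n)]
x1 = rsaga_step(x0, W, q, rng)
```

On this ring a node can receive at most two values at once, so with $q = 0.25$ every new value remains a convex combination of the previous ones.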



Figure 2 (panels: SAGA, RSAGA): Logarithmic plots of the simulated mean square deviation $\mathbb{E}[(x_\infty - \bar{x}(0))^2]$ against the graph size $N$, for SAGA and RSAGA systems with $q = 0.5$ and $q = 0.25$, respectively. Expectation is simulated by averaging over 1000 runs.



Observe that the columns of the update matrix are uncorrelated in the RSAGA system, as every 
node chooses independently to whom she is going to send her value. Hence, Theorem 11 implies the 
following result. 

Corollary 12 (RSAGA is asymptotically accurate). Consider the RSAGA system of Example 5 and assume that $W\mathbf{1} = \mathbf{1}$. Then Theorem 3 holds for any $\gamma \ge \frac{q}{1 - q\,d^c_{\max}}$.

Proof. Note that $\mathbb{E}[L(t)] = qL(W)$; then $\mathbf{1}^*\mathbb{E}[L(t)] = 0$ by the assumption on $W$. To apply Theorem 11 we observe that $a^c_{\max} = q$ since every node sends her value to exactly one other node, and $a^r_{\max} = q\,d^c_{\max}$, since a node can receive up to $d^c_{\max}$ values simultaneously. □

Observe that Corollaries 10 and 12 ensure that SAGA and RSAGA are asymptotically accurate, 
while an analysis based on the number of simultaneous updates (that is, on Theorem 6) would suggest 
values of $\gamma$ which are proportional to $N$ and thus do not guarantee asymptotic accuracy.



3.4 Simultaneous correlated updates 

We have seen in the previous subsections that asymptotically accurate systems are obtained when 
there are few simultaneous updates or when the updates are uncorrelated. We now present an example 
showing that systems with numerous and correlated updates may not be asymptotically accurate. 

Example 6 (A biased algorithm). Consider a system on $N = 2n$ nodes. At every time step $t$, one selects a bijection $f_t : \{1, \dots, n\} \to \{n+1, \dots, 2n\}$, i.e. a function $f_t$ such that, for every $j \in \{n+1, \dots, 2n\}$, there exists a unique $i = f_t^{-1}(j) \in \{1, \dots, n\}$ for which $f_t(i) = j$. At every time step, with a probability 1/2, every node $i \in \{1, \dots, n\}$ updates its value by $x_i(t+1) = \frac{1}{2}\big(x_i(t) + x_{f_t(i)}(t)\big)$ while the nodes $j \in \{n+1, \dots, 2n\}$ keep their values unchanged; otherwise, every node $j \in \{n+1, \dots, 2n\}$ updates its value by $x_j(t+1) = \frac{1}{2}\big(x_j(t) + x_{f_t^{-1}(j)}(t)\big)$, while the nodes $i \in \{1, \dots, n\}$ keep their values unchanged. The system clearly preserves the average on expectation. Suppose now that $x_1(t) = x_2(t) = \dots = x_n(t)$ and $x_{n+1}(t) = x_{n+2}(t) = \dots = x_{2n}(t)$ hold for $t = 0$. Then, a recurrence argument shows that these equalities hold for all $t$. Let $x_A(t)$ and $x_B(t)$ be the common values of the first $n$ nodes and of the latter $n$ nodes respectively. One can verify that the evolution of $x_A(t)$ and $x_B(t)$ and their (common) limit value $x_\infty$ is actually independent of $n$. Since the initial variance is $V(0) = \frac{1}{N}\sum_i (x_i(0) - \bar{x}(0))^2 = \frac{1}{4}(x_B(0) - x_A(0))^2$, the ratio between $\mathbb{E}(x_\infty - \bar{x}(0))^2$ and $V(0)$ is also independent of $n$. In fact, it can be computed that $\mathbb{E}(x_\infty - \bar{x}(0))^2 = \frac{1}{3}V(0)$. The mean square error does thus not decrease when $n$ grows. □
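The ratio in Example 6 can be computed exactly with rational arithmetic. The sketch below is ours and relies on the following observations, which follow from the update rule with averaging weight 1/2: the gap $d(t) = x_B(t) - x_A(t)$ halves at every step whichever half updates, and the average moves by $+d(t)/4$ or $-d(t)/4$ with probability 1/2 each, independently across steps. The function name `example6_ratio` is hypothetical.

```python
from fractions import Fraction


def example6_ratio(steps):
    """E[(xbar(steps) - xbar(0))^2] / V(0) for x_A(0) = 0 and x_B(0) = 1."""
    d = Fraction(1)             # gap x_B - x_A, deterministic: d(t) = 2**(-t)
    variance = Fraction(0)
    for _ in range(steps):
        variance += (d / 4) ** 2  # independent zero-mean increments of size d/4
        d /= 2
    v0 = Fraction(1, 4)         # V(0) = (x_B(0) - x_A(0))^2 / 4
    return variance / v0


ratios = [example6_ratio(t) for t in (1, 2, 30)]
```

The ratio equals $\frac{1}{3}(1 - 4^{-t})$ after $t$ steps and therefore tends to $1/3$, independently of $n$.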
However, one should not conclude that every system whose simultaneous updates are numerous and not strictly uncorrelated
fails to be asymptotically accurate. In particular, small mean square errors can still occur
for systems where the updates follow some more complex probability law, presenting some partial 
correlations. An example is the following algorithm, which generalizes the BGA and has been proposed 
in [1]. 

Example 7 (Probabilistic Broadcast Gossip Algorithm (PBGA)). Let $q \in (0, 1)$ and $G = (I, W)$. At 
each time step, one node $j$, sampled from a uniform distribution over $I$, broadcasts her current value. 
Every node $i$ receives the value with a probability $W_{ij} \in [0, 1]$. When a node $i$ does receive the value 
from $j$, she updates her value to $x_i(t+1) = x_i(t) + q\,(x_j(t) - x_i(t))$. 
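
One step of the PBGA can be sketched as follows. This is a minimal illustration, not the implementation of [1]: the function name `pbga_step`, the list-of-lists representation of W, and the assumption that receptions are drawn independently across nodes are choices made here for the example.

```python
import random


def pbga_step(x, W, q, rng):
    """Perform one PBGA step; return the new state and the broadcaster index."""
    n = len(x)
    j = rng.randrange(n)  # broadcaster, uniform over the node set
    new_x = list(x)
    for i in range(n):
        # Node i receives the broadcast with probability W[i][j]
        # (assumption: independently of the other nodes).
        if i != j and rng.random() < W[i][j]:
            new_x[i] = x[i] + q * (x[j] - x[i])
    return new_x, j


if __name__ == "__main__":
    rng = random.Random(0)
    W = [[0.0, 0.5, 0.2],
         [0.5, 0.0, 1.0],
         [0.2, 1.0, 0.0]]
    x = [0.0, 1.0, 2.0]
    for _ in range(5):
        x, j = pbga_step(x, W, 0.3, rng)
        print(j, x)
```

All receivers read the broadcaster's value at time t, so the new state is built from a copy rather than updated in place.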

Proposition 13 (PBGA is asymptotically accurate). Assume that $W = W^*$. Then, Theorem 3 holds 
with 

$$\gamma \ge (W_{\max} + 1)\,\frac{q}{1-q},$$

where $W_{\max} = \max_{i \in I} \sum_{j} W_{ij}$. 

Proof. From [1, Lemma 2] we can quickly derive the following formulas for every $t \ge 0$, 

$$E[L(t)] = \frac{q}{N} L(W),$$

$$E[L(t)^* L(t)] = 2\frac{q^2}{N} L(W),$$

$$E[L(t)^* \mathbf{1}\mathbf{1}^* L(t)] = \frac{q^2}{N} L(W)^2 + 2\frac{q^2}{N} L(W) - 2\frac{q^2}{N} L(W \odot W),$$

where $W \odot W$ denotes the entrywise product. The assumption on $W$ implies that $\mathbf{1}^* E[L(t)] = 0$, and in 
order to apply Theorem 3 we have to find $\gamma$ which satisfies the inequality 

$$\frac{q^2}{N} L(W)^2 + 2\frac{q^2}{N} L(W) - 2\frac{q^2}{N} L(W \odot W) \le \gamma \left( 2\frac{q}{N} L(W) - 2\frac{q^2}{N} L(W) \right),$$

that is 

$$L(W)^2 - 2L(W \odot W) \le 2\left(\gamma\,\frac{1-q}{q} - 1\right) L(W).$$

Since any Laplacian, and in particular $L(W \odot W)$, is positive semidefinite, a sufficient condition for 
the previous inequality to hold is 

$$L(W)^2 \le 2\left(\gamma\,\frac{1-q}{q} - 1\right) L(W).$$

As Gershgorin's disk lemma implies that the spectral radius of $L(W)$ is not larger than $2W_{\max}$, the 
statement of the result follows. □ 
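
The final inequality of the proof can be spot-checked numerically for a given symmetric W. The sketch below is an illustration only: it takes $L(W) = \mathrm{diag}(W\mathbf{1}) - W$ as the Laplacian (an assumption about the paper's convention), sets $\gamma$ to the smallest value allowed by Proposition 13, and tests the matrix inequality on random quadratic forms, which is a necessary check and not a proof of semidefiniteness. All function names are hypothetical.

```python
import random


def laplacian(W):
    """Return L(W) = diag(W 1) - W (assumed convention), ignoring self-loops."""
    n = len(W)
    L = [[-W[i][k] for k in range(n)] for i in range(n)]
    for i in range(n):
        L[i][i] = sum(W[i][k] for k in range(n) if k != i)
    return L


def matvec(M, x):
    return [sum(M[i][k] * x[k] for k in range(len(x))) for i in range(len(M))]


def quad(M, x):
    """Quadratic form x^T M x."""
    return sum(xi * yi for xi, yi in zip(x, matvec(M, x)))


def gamma_bound(W, q):
    """Smallest gamma allowed by Proposition 13: (W_max + 1) q / (1 - q)."""
    w_max = max(sum(row) for row in W)
    return (w_max + 1.0) * q / (1.0 - q)


def check_inequality(W, q, trials=200, seed=0):
    """Test L^2 - 2 L(W o W) <= 2 (gamma (1-q)/q - 1) L on random vectors."""
    rng = random.Random(seed)
    L = laplacian(W)
    L_had = laplacian([[w * w for w in row] for row in W])  # entrywise square
    coeff = 2.0 * (gamma_bound(W, q) * (1.0 - q) / q - 1.0)
    for _ in range(trials):
        x = [rng.uniform(-1.0, 1.0) for _ in range(len(W))]
        Lx = matvec(L, x)
        # x^T L^2 x equals |L x|^2 because L is symmetric when W is.
        lhs = sum(v * v for v in Lx) - 2.0 * quad(L_had, x)
        if lhs > coeff * quad(L, x) + 1e-9:
            return False
    return True


if __name__ == "__main__":
    W = [[0.0, 0.5, 0.2],
         [0.5, 0.0, 1.0],
         [0.2, 1.0, 0.0]]
    print(gamma_bound(W, 0.3), check_inequality(W, 0.3))
```

Note that the bound computed by `gamma_bound` depends only on q and on the largest row sum of W, not on the eigenvalues of the Laplacian.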



4 Conclusion and perspectives 

We have developed a new way of evaluating the mean square error of decentralized consensus protocols 
that preserve the average on expectation. Unlike previous approaches, which relied on the convergence 
speed of these systems, our results are based on the fact that the increase of the error can be bounded 
proportionally to the decrease of the disagreement. As such, they are independent of the speed at 
which the system converges, and therefore of the spectral properties of the network, which determine 
this speed. Notably, many of our bounds only involve local quantities such as the degree of the nodes or 
the weight that they give to their neighbors' values, as opposed to global ones such as the eigenvalues 
of the network Laplacian. As local quantities are much easier to control in distributed systems, our 
results are of immediate application in design. 

Our method can be applied to several known protocols: although we have sometimes been very 
conservative when deriving our bounds, our method provides bounds that are more accurate than 
those available in the literature in almost all cases treated, and closely match the experimental results 
from algorithm simulations, capturing the qualitative dependence on the network size. Indeed our 
results ensure that, under mild conditions, distributed averaging can be performed via asymmetric 
and asynchronous algorithms, with a loss in the quality of the estimate which vanishes when increasing 
the number of samples (and nodes). This fact strongly supports the application of such algorithms to 
large networks. 

In the interest of concision and simplicity, we have limited the number of possible particular cases 
of our results: there are thus many possibilities for extending our results to more complex protocols. 
Overall, two classes of systems were proved to be asymptotically accurate: those with sufficiently few 
or small simultaneous updates, and those with sufficiently uncorrelated simultaneous updates. These 
two apparently complementary situations actually present strong similarities; recall indeed that 
the updates taking place at different times are assumed to be uncorrelated. This suggests that the 
real parameter determining the mean square error is the level of correlation between the updates 
taking place across the history of the system. Further work could be devoted to formalizing and 
quantifying this intuition on the importance of correlations between the updates. Finally, we note 
that the distribution of final values for processes that do not preserve the average on expectation has, 
to the best of our knowledge, not been studied so far. 